model confidence
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
- North America > United States > Maryland > Prince George's County > College Park (0.04)
- Asia > Middle East > Jordan (0.04)
- Europe > Germany > Rhineland-Palatinate > Kaiserslautern (0.04)
- Europe > Germany > Saarland > Saarbrücken (0.04)
Perplexity Cannot Always Tell Right from Wrong
Veličković, Petar, Barbero, Federico, Perivolaropoulos, Christos, Osindero, Simon, Pascanu, Razvan
Perplexity -- a function measuring a model's overall level of "surprise" when encountering a particular output -- has gained significant traction in recent years, both as a loss function and as a simple-to-compute metric of model quality. Prior studies have pointed out several limitations of perplexity, often in an empirical manner. Here we leverage recent results on Transformer continuity to show, in a rigorous manner, how perplexity may be an unsuitable metric for model selection. Specifically, we prove that if there is any sequence that a compact decoder-only Transformer model predicts accurately and confidently -- a necessary prerequisite for strong generalisation -- then there must exist another sequence with very low perplexity that the same model does not predict correctly. Further, by analytically studying iso-perplexity plots, we find that perplexity will not always select the more accurate model -- rather, any increase in model confidence must be accompanied by a commensurate rise in accuracy for the new model to be selected.
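For readers less familiar with the metric, the standard definition of perplexity that this argument rests on can be written in a few lines. The toy sequences and probabilities below are invented for illustration and are not taken from the paper or its proof:

```python
import numpy as np

def perplexity(token_probs):
    """Perplexity of a sequence given the probability the model assigned
    to each ground-truth token: exp of the mean negative log-likelihood."""
    token_probs = np.asarray(token_probs, dtype=float)
    return float(np.exp(-np.mean(np.log(token_probs))))

# A sequence the model predicts confidently and correctly:
# every ground-truth token receives probability 0.95.
confident_correct = [0.95] * 8
print(perplexity(confident_correct))   # ~1.05 (very low)

# A different sequence with comparably low perplexity on which greedy decoding
# is nevertheless wrong: at one position the true token gets 0.45 while some
# other token gets 0.50, so the argmax prediction errs there, yet the averaged
# log-likelihood (and hence the perplexity) stays small.
low_ppl_but_wrong = [0.95] * 7 + [0.45]
print(perplexity(low_ppl_but_wrong))   # ~1.16 (still low)
```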
Expert load matters: operating networks at high accuracy and low manual effort
In human-AI collaboration systems for critical applications, in order to ensure minimal error, users should set an operating point based on model confidence to determine when a decision should be delegated to human experts. Samples for which model confidence is lower than the operating point are manually analysed by experts to avoid mistakes. Such systems can become truly useful only if they consider two aspects: models should be confident only for samples on which they are accurate, and the number of samples delegated to experts should be minimized. The latter aspect is especially crucial for applications where available expert time is limited and expensive, such as healthcare. The trade-off between model accuracy and the number of samples delegated to experts can be represented by a curve similar to an ROC curve, which we refer to as the confidence operating characteristic (COC) curve. In this paper, we argue that deep neural networks should be trained by taking into account both accuracy and expert load and, to that end, we propose a new complementary loss function for classification that maximizes the area under this COC curve. This simultaneously promotes an increase in network accuracy and a reduction in the number of samples delegated to humans. We perform experiments on multiple computer vision and medical image datasets for classification. Our results demonstrate that, compared to existing loss functions, the proposed loss improves classification accuracy, delegates fewer decisions to experts, achieves better out-of-distribution sample detection, and provides on-par calibration performance.
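The COC curve described above can be sketched concretely: sweep a confidence threshold, delegate every sample below it to the expert, and record model accuracy on the samples the model keeps. The sketch below is based only on that description; the helper names and toy data are made up, and the paper's differentiable loss surrogate is not reproduced here:

```python
import numpy as np

def coc_curve(confidences, correct):
    """Confidence operating characteristic: for each confidence threshold,
    delegate samples below it to the expert and measure model accuracy on
    the samples the model keeps.  Returns (fraction_delegated, accuracy)."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    delegated, accuracy = [], []
    for t in np.unique(confidences):          # thresholds in ascending order
        keep = confidences >= t
        delegated.append(1.0 - keep.mean())   # expert load at this threshold
        # Guard: if everything were delegated, accuracy on the kept set is
        # undefined; treat it as 1.0 (the expert handles every sample).
        accuracy.append(correct[keep].mean() if keep.any() else 1.0)
    return np.array(delegated), np.array(accuracy)

def coc_auc(confidences, correct):
    """Area under the COC curve via the trapezoidal rule."""
    x, y = coc_curve(confidences, correct)
    order = np.argsort(x)
    return float(np.trapz(y[order], x[order]))

# Toy usage with made-up predictions: higher confidence should track correctness.
conf = np.array([0.99, 0.95, 0.90, 0.80, 0.60, 0.55, 0.40, 0.30])
hit  = np.array([1,    1,    1,    1,    0,    1,    0,    0], dtype=bool)
print(coc_auc(conf, hit))
```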
- Asia > South Korea (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Health & Medicine > Therapeutic Area (0.72)
- Health & Medicine > Health Care Technology > Medical Record (0.48)
Does In-IDE Calibration of Large Language Models work at Scale?
Koohestani, Roham, Sergeyuk, Agnia, Gros, David, Spiess, Claudio, Titov, Sergey, Devanbu, Prem, Izadi, Maliheh
The introduction of large language models into integrated development environments (IDEs) is revolutionizing software engineering, yet it poses challenges to the usefulness and reliability of Artificial Intelligence-generated code. Post-hoc calibration of internal model confidences aims to align probabilities with an acceptability measure. Prior work suggests calibration can improve alignment, but at-scale evidence is limited. In this work, we investigate the feasibility of applying calibration of code models in an in-IDE context. We study two aspects of the problem: (1) the technical method for implementing confidence calibration and improving the reliability of code generation models, and (2) the human-centered design principles for effectively communicating reliability signals to developers. First, we develop a scalable and flexible calibration framework which can be used to obtain calibration weights for open-source models using any dataset, and we evaluate whether calibrators improve the alignment between model confidence and developer acceptance behavior. Through a large-scale analysis of over 24 million real-world developer interactions across multiple programming languages, we find that a general, post-hoc calibration model based on Platt scaling does not, on average, improve the reliability of model confidence signals. We also find that while dynamically personalizing calibration to individual users can be effective, its effectiveness is highly dependent on the volume of user interaction data. Second, we conduct a multi-phase design study with 3 expert designers and 153 professional developers, combining scenario-based design, semi-structured interviews, and survey validation, which reveals a clear preference for presenting reliability signals via non-numerical, color-coded indicators within the in-editor code generation workflow.
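Platt scaling itself is a standard post-hoc calibration technique: fit a logistic regression that maps a raw confidence score to the probability of the event of interest (here, developer acceptance). A minimal sketch, assuming scikit-learn and invented interaction data; the paper's actual framework, features, and personalization scheme are not reproduced:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Raw confidences the code model assigned to its suggestions (hypothetical),
# and whether the developer accepted (1) or rejected (0) each suggestion.
raw_conf = np.array([0.95, 0.90, 0.85, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10])
accepted = np.array([1,    1,    0,    1,    1,    0,    0,    1,    0,    0])

# Platt scaling: a one-feature logistic regression on the raw score,
# i.e. a sigmoid sigma(a*x + b) fitted by maximum likelihood.
calibrator = LogisticRegression()
calibrator.fit(raw_conf.reshape(-1, 1), accepted)

# Calibrated probability that a new suggestion will be accepted.
new_scores = np.array([[0.92], [0.35]])
print(calibrator.predict_proba(new_scores)[:, 1])
```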
- North America > United States > California > Yolo County > Davis (0.14)
- Europe > Netherlands > South Holland > Delft (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (2 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Questionnaire & Opinion Survey (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Human Computer Interaction > Interfaces (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.90)
The Role of Model Confidence on Bias Effects in Measured Uncertainties for Vision-Language Models
Liu, Xinyi, Wang, Weiguang, He, Hangfeng
With the growing adoption of Large Language Models (LLMs) for open-ended tasks, accurately assessing epistemic uncertainty, which reflects a model's lack of knowledge, has become crucial to ensuring reliable outcomes. However, quantifying epistemic uncertainty in such tasks is challenging due to the presence of aleatoric uncertainty, which arises from multiple valid answers. While bias can introduce noise into epistemic uncertainty estimation, it may also reduce noise from aleatoric uncertainty. To investigate this trade-off, we conduct experiments on Visual Question Answering (VQA) tasks and find that mitigating prompt-introduced bias improves uncertainty quantification in GPT-4o. Building on prior work showing that LLMs tend to copy input information when model confidence is low, we further analyze how these prompt biases affect measured epistemic and aleatoric uncertainty across varying bias-free confidence levels with GPT-4o and Qwen2-VL. We find that all considered biases have greater effects on both uncertainties when bias-free model confidence is lower. Moreover, lower bias-free model confidence is associated with greater bias-induced underestimation of epistemic uncertainty, resulting in overconfident estimates, whereas it has no significant effect on the direction of the bias effect in aleatoric uncertainty estimation. These distinct effects deepen our understanding of bias mitigation for uncertainty quantification and can potentially inform the development of more advanced techniques.
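As background, one standard way to separate the two kinds of uncertainty (not necessarily the estimators used in this paper) is the entropy/mutual-information decomposition over multiple stochastic draws of the model. The sketch below uses made-up answer distributions purely to illustrate the distinction:

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy (in nats) of a discrete distribution."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0)
    return float(-np.sum(p * np.log(p)))

def decompose_uncertainty(draws):
    """`draws` is an (n_draws, n_answers) array, each row a distribution over
    candidate answers from one stochastic pass (e.g. one sampled chain or one
    prompt variant).
    total     = H(mean distribution)   -- total predictive uncertainty
    aleatoric = mean per-draw entropy  -- ambiguity within each draw
    epistemic = total - aleatoric      -- disagreement between draws
    """
    draws = np.asarray(draws, dtype=float)
    total = entropy(draws.mean(axis=0))
    aleatoric = float(np.mean([entropy(d) for d in draws]))
    return total, aleatoric, total - aleatoric

# Made-up example: three draws over four candidate answers.  The draws agree
# but each is spread out -> mostly aleatoric uncertainty (several valid answers).
agreeing = [[0.4, 0.4, 0.1, 0.1]] * 3
# Each draw is confident but they disagree -> mostly epistemic uncertainty.
disagreeing = [[0.90, 0.05, 0.03, 0.02],
               [0.05, 0.90, 0.03, 0.02],
               [0.05, 0.03, 0.90, 0.02]]
print(decompose_uncertainty(agreeing))
print(decompose_uncertainty(disagreeing))
```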
- Asia > Singapore (0.04)
- Europe > France (0.04)
- Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)